feat: hhtl_cascade_search score_fn callback - LEAF level pluggable #73

score_fn(row, col) -> f32 replaces the hardcoded 1.0 placeholder.
lance-graph passes LanceDB VectorSearch as the LEAF backend.
ndarray provides the cascade logic. lance-graph provides the data.
23 p64 tests passing.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
Merged
Base17::l1(): 16 of 17 dims via I32x16 (sub, abs, reduce_sum), 17th scalar.
Base17::l1_weighted(): same + I32x16 multiply for PCDVQ weights [20,3,3,3,3,3,1,1,1,1,1,1,1,1,1,1].
Non-x86 fallback preserved (scalar loop).

Before: 611M lookups/sec, 1.8 ns/lookup, 17K tokens/sec
After: 719M lookups/sec, 1.4 ns/lookup, 22K tokens/sec

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
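The 16 + 1 split described in this commit can be illustrated with a scalar reference. This is a minimal sketch, not the PR's code: the `Base17` layout (17 `i16` dims), the `u32` return type, and `TAIL_WEIGHT` for the 17th dimension are assumptions, since the commit message lists only 16 weights.

```rust
/// Hypothetical stand-in for the PR's Base17: 17 signed 16-bit dimensions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Base17 {
    pub dims: [i16; 17],
}

/// PCDVQ weights for the first 16 dims, copied from the commit message.
pub const PCDVQ_WEIGHTS: [i32; 16] = [20, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1];

/// Assumption: the commit does not state a weight for the 17th dim.
pub const TAIL_WEIGHT: i32 = 1;

impl Base17 {
    /// Absolute differences of the first 16 dims, widened to i32.
    /// This is the part a 16-lane vector (I32x16) would compute at once.
    fn abs_diff_body(&self, other: &Base17) -> [i32; 16] {
        let mut lanes = [0i32; 16];
        for i in 0..16 {
            lanes[i] = (self.dims[i] as i32 - other.dims[i] as i32).abs();
        }
        lanes
    }

    /// Absolute difference of the 17th dim, handled as a scalar tail.
    fn abs_diff_tail(&self, other: &Base17) -> i32 {
        (self.dims[16] as i32 - other.dims[16] as i32).abs()
    }

    /// L1 distance: reduce the 16 lanes, then add the scalar tail.
    pub fn l1(&self, other: &Base17) -> u32 {
        let body: i32 = self.abs_diff_body(other).iter().sum();
        (body + self.abs_diff_tail(other)) as u32
    }

    /// Weighted L1: multiply each lane by its weight before the reduction.
    pub fn l1_weighted(&self, other: &Base17) -> u32 {
        let lanes = self.abs_diff_body(other);
        let body: i32 = lanes
            .iter()
            .zip(PCDVQ_WEIGHTS.iter())
            .map(|(d, w)| d * w)
            .sum();
        (body + TAIL_WEIGHT * self.abs_diff_tail(other)) as u32
    }
}
```

Widening to `i32` before subtracting avoids `i16` overflow, which is presumably why the commit uses I32x16 rather than a 16-bit lane type.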
All Base17 hot-path ops now SIMD:
  l1() → I32x16 sub + abs + reduce_sum
  l1_weighted() → I32x16 sub + abs + mul + reduce_sum
  sign_agreement() → I32x16 xor + simd_min + count
  xor_bind() → I32x16 xor

.cargo/config.toml: target-cpu=native (not x86-64-v4)
  → GitHub CI gets AVX2/SSE4.2 fallback automatically
  → Local dev gets AVX-512 if available
  → cfg(target_feature = "avx512f") handles compile-time dispatch

728M lookups/sec, 22K tokens/sec. 19 tests passing.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
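For readers unfamiliar with the XOR-based ops, here is a scalar sketch of what `sign_agreement()` and `xor_bind()` plausibly compute. The signatures and the exact semantics (zero treated as non-negative, agreement counted per dimension) are assumptions; only the "xor, then compare, then count" shape comes from the commit messages.

```rust
/// Count the dimensions whose signs agree.
/// The XOR of two integers is non-negative exactly when their sign bits match,
/// so one XOR plus one comparison per lane is enough.
/// Assumption: zero counts as non-negative.
pub fn sign_agreement(a: &[i16; 17], b: &[i16; 17]) -> u32 {
    let mut agree = 0u32;
    for i in 0..17 {
        if (a[i] ^ b[i]) >= 0 {
            agree += 1;
        }
    }
    agree
}

/// Element-wise XOR of two vectors. XOR is its own inverse,
/// so binding twice with the same key recovers the input.
pub fn xor_bind(a: &[i16; 17], b: &[i16; 17]) -> [i16; 17] {
    std::array::from_fn(|i| a[i] ^ b[i])
}
```

Because `xor_bind` is self-inverse, `xor_bind(&xor_bind(&a, &b), &b)` returns `a`, which makes a convenient round-trip test.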
GitHub CI: override with CARGO_BUILD_RUSTFLAGS="-C target-cpu=x86-64-v3" https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
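Compile-time dispatch via `cfg(target_feature = ...)` depends entirely on the flags the crate was built with, which is why the CI override above matters. A small sketch (the function name `compiled_isa` is hypothetical) that reports which path a given build would take:

```rust
/// Report which code path the compile-time `cfg` checks select.
/// The answer is fixed when the crate is compiled, for example by
/// `-C target-cpu=native` or `-C target-cpu=x86-64-v3`; it does not
/// inspect the CPU the binary later runs on.
pub fn compiled_isa() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        "avx512f"
    } else if cfg!(target_feature = "avx2") {
        "avx2"
    } else {
        "scalar"
    }
}
```

A binary built with AVX-512 enabled this way will fault on a CPU without it, which is the limitation the later LazyLock commits address.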
Per-function #[target_feature(enable = "avx512f")] / "avx2". LazyLock runtime detection, one binary for all ISAs.
  l1_avx512: _mm512_cvtepi16_epi32 + _mm512_sub + _mm512_abs + reduce_add
  l1_avx2: _mm256_cvtepi16_epi32 + _mm256_sub + _mm256_abs + horizontal sum
  l1_scalar: for i in 0..17 (non-x86 fallback)

605M lookups/sec (LazyLock) vs 728M (hardcoded AVX-512). 19 tests passing.
.cargo/config.toml: no global target-cpu.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
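The per-function `#[target_feature]` plus `LazyLock` pattern might look roughly like the sketch below. It is not the PR's code: it covers only the AVX2 and scalar paths (the AVX-512 kernel is omitted for brevity), and the names `L1Fn`, `L1_KERNEL`, and `l1_avx2_entry` are hypothetical.

```rust
use std::sync::LazyLock;

/// Signature shared by every kernel variant (hypothetical alias).
type L1Fn = fn(&[i16; 17], &[i16; 17]) -> u32;

/// Portable fallback, used on non-x86 targets or when AVX2 is absent.
pub fn l1_scalar(a: &[i16; 17], b: &[i16; 17]) -> u32 {
    let mut sum = 0u32;
    for i in 0..17 {
        sum += (a[i] as i32 - b[i] as i32).unsigned_abs();
    }
    sum
}

/// AVX2 kernel: two 8-lane passes cover 16 dims, the 17th is scalar.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn l1_avx2(a: &[i16; 17], b: &[i16; 17]) -> u32 {
    use std::arch::x86_64::*;
    let mut sum = 0u32;
    unsafe {
        for half in 0..2 {
            let off = half * 8;
            // Load 8 x i16 and sign-extend to 8 x i32.
            let va = _mm256_cvtepi16_epi32(_mm_loadu_si128(a.as_ptr().add(off) as *const __m128i));
            let vb = _mm256_cvtepi16_epi32(_mm_loadu_si128(b.as_ptr().add(off) as *const __m128i));
            let diff = _mm256_abs_epi32(_mm256_sub_epi32(va, vb));
            // Horizontal sum via a store; simple rather than fastest.
            let mut lanes = [0i32; 8];
            _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, diff);
            sum += lanes.iter().map(|&x| x as u32).sum::<u32>();
        }
    }
    sum + (a[16] as i32 - b[16] as i32).unsigned_abs()
}

/// Safe wrapper so the kernel fits the plain `fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn l1_avx2_entry(a: &[i16; 17], b: &[i16; 17]) -> u32 {
    // SAFETY: only selected below after AVX2 was detected at runtime.
    unsafe { l1_avx2(a, b) }
}

/// Detected once on first use, then reused for every call.
static L1_KERNEL: LazyLock<L1Fn> = LazyLock::new(|| {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return l1_avx2_entry as L1Fn;
        }
    }
    l1_scalar as L1Fn
});

/// Public entry point: one indirect call through the cached pointer.
pub fn l1(a: &[i16; 17], b: &[i16; 17]) -> u32 {
    (*L1_KERNEL)(a, b)
}
```

The indirect call through the cached pointer prevents inlining, which is one plausible reason for the gap between the 605M (LazyLock) and 728M (hardcoded AVX-512) figures reported above.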
…azyLock

4 functions converted to multi-versioned kernels:
  l1_weighted: I32x16 mul(abs_diff, weights) + reduce_sum
  sign_agreement: I32x16 xor + cmpge_mask + count_ones
  xor_bind: I32x16 xor + cvtepi32_epi16 pack-back
  inject_noise: I32x16 add(dims, prng_noise) + clamp

Pattern: #[target_feature(enable = "avx512f")] per-function, LazyLock runtime detection, one binary serves all ISAs.
No global target-cpu in .cargo/config.toml. CI (AVX2) and Production (AVX-512) use same binary.

629M lookups/sec, 19K tokens/sec, 19 tests passing.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
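The "add + clamp" shape of `inject_noise` can be sketched in scalar form. The signature is an assumption: the noise is passed in as an array here to avoid guessing at the PRNG, and clamping to the `i16` range is likewise assumed rather than taken from the PR.

```rust
/// Add per-dimension noise and clamp the result back into range.
/// Widening to i32 before the add mirrors the I32x16 lanes and prevents
/// wrap-around. Assumption: the clamp bounds are the full i16 range.
pub fn inject_noise(dims: &[i16; 17], noise: &[i16; 17]) -> [i16; 17] {
    std::array::from_fn(|i| {
        let wide = dims[i] as i32 + noise[i] as i32;
        wide.clamp(i16::MIN as i32, i16::MAX as i32) as i16
    })
}
```

With `dims[0] = 32000` and `noise[0] = 1000`, the widened sum is 33000 and the clamp saturates it to 32767 instead of wrapping to a negative value.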
p64 multi-versioned kernels (AVX-512/AVX2/scalar via LazyLock):
  attend(): 8 rows/iter via _mm512_and_si512 + scalar popcnt
  nearest_k(): 8 XORs/iter via _mm512_xor_si512
  moe_gate(): all 8 planes in one zmm register

palette_distance nearest(): 4-way unrolled loop, inner l1() already SIMD-dispatched

All scalar loops from the audit now have SIMD versions:
  bgz17_bridge: l1, l1_weighted, sign_agreement, xor_bind, inject_noise
  palette_distance: nearest (4-way unroll)
  p64: attend, nearest_k, moe_gate

78 tests passing. 695M lookups/sec. 21K tokens/sec.
One universal binary: LazyLock runtime detects AVX-512/AVX2.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
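As a rough scalar reference for the p64 kernels, the sketch below assumes a layout of 64 rows of 64 bits and assumes that `attend` scores rows by popcount of `row AND query` while `nearest_k` ranks rows by Hamming distance. None of these signatures are confirmed by the PR; only the AND/XOR plus popcount operations are.

```rust
/// Hypothetical p64 layout: 64 rows, each a 64-bit mask.
pub type P64 = [u64; 64];

/// Per-row overlap with the query: popcount(row AND query).
/// An AVX-512 version would AND 8 rows per 512-bit register.
pub fn attend(rows: &P64, query: u64) -> [u32; 64] {
    std::array::from_fn(|i| (rows[i] & query).count_ones())
}

/// The k rows closest to the query by Hamming distance, popcount(row XOR query).
/// Ties are broken by row index so the result is deterministic.
pub fn nearest_k(rows: &P64, query: u64, k: usize) -> Vec<(usize, u32)> {
    let mut dists: Vec<(usize, u32)> = rows
        .iter()
        .enumerate()
        .map(|(i, &row)| (i, (row ^ query).count_ones()))
        .collect();
    dists.sort_by_key(|&(i, d)| (d, i));
    dists.truncate(k);
    dists
}
```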
score_fn(row, col) -> f32 replaces the hardcoded 1.0 placeholder.
lance-graph passes LanceDB VectorSearch as the LEAF backend.
ndarray provides the cascade logic. lance-graph provides the data.
23 p64 tests passing.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
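A pluggable LEAF scorer of the shape `score_fn(row, col) -> f32` could be threaded through as a generic closure. This is a minimal sketch under assumptions: `leaf_scores` is a hypothetical helper, not `hhtl_cascade_search` itself, and `usize` for `row` and `col` is a guess.

```rust
/// Score the candidates that survived the earlier cascade levels.
/// The caller supplies `score_fn`, so the cascade logic stays independent
/// of where the scores come from (for example a vector-search backend).
pub fn leaf_scores<F>(candidates: &[(usize, usize)], score_fn: F) -> Vec<(usize, usize, f32)>
where
    F: Fn(usize, usize) -> f32,
{
    let mut scored: Vec<(usize, usize, f32)> = candidates
        .iter()
        .map(|&(row, col)| (row, col, score_fn(row, col)))
        .collect();
    // Highest score first; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.2.total_cmp(&a.2));
    scored
}
```

Passing `|_, _| 1.0` reproduces the old placeholder behaviour, so existing callers can migrate without changing results.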
AdaWorldAPI pushed a commit that referenced this pull request on Apr 19, 2026 (2 tasks)
AdaWorldAPI pushed a commit that referenced this pull request on May 16, 2026
…only)
`tests/par_azip.rs` has `use itertools::{assert_equal, cloned, enumerate}`
under `#[cfg(feature = "approx")]`, but `test_par_azip9` called
`assert_equal(cloned(&a), x)` at line 85 unconditionally. With approx OFF
the import is excluded and compile fails with E0425.
Latent for as long as the file existed (since PR #73 era); never surfaced
because no CI matrix combination ran `--features rayon` without approx.
W-I4's new `hpc-stream-parallel` job exercises exactly that combination
and tripped the failure.
Fix: replace `assert_equal(cloned(&a), x)` with `assert_eq!(a, x)` —
both `a` and `x` are `Array<i32, _>` and the file's other tests already
use direct `assert_eq!`. Trim the now-dead `assert_equal, cloned` from
the still-needed (test_par_azip3) `enumerate` import.
Verified clean compile + 6/6 tests pass under both:
cargo test --features rayon --test par_azip
cargo test --features "rayon approx" --test par_azip
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS